Structural characterisation of hemagglutinin from seven Influenza A H1N1 strains reveal diversity in the C05 antibody recognition site

Influenza virus (IV) causes several outbreaks of the flu each year, resulting in an economic burden to the healthcare system in the billions of dollars. Several influenza pandemics have occurred during the last century and are estimated to have caused 100 million deaths. There are four genera of IV, A (IVA), B (IVB), C (IVC), and D (IVD), with IVA being the most virulent to the human population. Hemagglutinin (HA) is an IVA surface protein that allows the virus to attach to host cell receptors and enter the cell. Here we have characterised the high-resolution structures of seven IVA HAs, with one in complex with the anti-influenza head-binding antibody C05. Our analysis revealed conserved receptor binding residues in all structures, as seen in previously characterised IV HAs. Amino acid conservation is more prevalent on the stalk than the receptor binding domain (RBD; also called the head domain), allowing the virus to escape from antibodies targeting the RBD. The equivalent site of C05 antibody binding to A/Denver/57 HA appears hypervariable in the other H1N1 IV HAs. Modifications within this region appear to disrupt binding of the C05 antibody, as these HAs no longer bind the C05 antibody by analytical SEC. Our study brings new insights into the structural and functional recognition of IV HA proteins and can contribute to further development of anti-influenza vaccines.

Influenza virus (IV) is one of the most important viral pathogens impacting human health and the global economy 1 . It is estimated that around 10% of the global population is infected annually, with an economic burden of more than $87 billion (USD) on the healthcare system 2 , and that IV causes around 300-600 thousand deaths worldwide each year 3 . Humans have experienced several influenza pandemics in the last century. The most severe include the Spanish (1918-1919), Asian (1957-1958), and Hong Kong (1968-1970) influenza pandemics, which cumulatively caused around 100 million deaths. Most recently, the 2009-2010 influenza pandemic (swine flu) resulted in 284,000 deaths 2,4 .
IV is a member of the Orthomyxoviridae family of segmented, negative-sense, single-stranded RNA viruses. There are four genera of IVs, A (IVA), B (IVB), C (IVC), and D (IVD). The genomes of IVA and IVB consist of eight RNA segments while the genomes of IVC and IVD consist of seven RNA segments, enveloped by a phospholipid bilayer derived from the host membrane. These segments encode for a variety of structural and non-structural proteins 5 . In IVA and IVB, two of these structural proteins, hemagglutinin (HA) and neuraminidase (NA), are inserted into the phospholipid bilayer as spikes 6 . HA is responsible for viral attachment, entry, and fusion into host cells, while NA cleaves the cell receptor to facilitate viral release 1 . In IVC and IVD, there is only a single spike protein, hemagglutinin-esterase-fusion (HEF), which is responsible for both viral attachment and release 7 .
In IVA there are 18 HA subtypes, phylogenetically divided into two groups. Mammalian IVA infection is initiated through HAs binding to α2,6 sialic acid-linkage galactose receptors on the host cell surface, before entering the cell via endocytosis 1 . Following endocytosis, HAs undergo conformational rearrangements due to the low pH of the endosome, leading to membrane fusion 8,9 . Influenza HA is synthesised as an immature precursor (HA0) that must be proteolytically cleaved into HA1 and HA2 polypeptides for the activation of membrane fusion, and a recent study characterized the structural differences of HA0 at neutral and low pH compared to HA1/HA2 at low pH 10 . Notably, the most commonly observed HA structure in the public database is the cleaved HA1/HA2 assembly at neutral pH. HA is a homotrimeric glycoprotein, with each monomer composed of two regions: (1) the head, comprising the receptor binding domain (RBD) which is crucial in viral attachment, and (2) the stalk, that is responsible for viral and cell membrane fusion in the endosome 11,12 . While the head contains a sialic acid binding pocket and plays the key antigenic role 12,13 , it has been shown that several antibodies can also recognise antigenic motifs on the stalk 14 .
The stalk is now being investigated for vaccine design as it is highly conserved in both IVA and IVB 15 . Constant antigenic drift causes a hypervariable area on the RBD, thus influenza vaccines targeting this area are inefficient. However, some anti-influenza antibodies, such as C05 and F045-092, have overcome this variation and can bind several IV strains within and across subtypes 14 . Some stalk-binding antibodies can also neutralise a wide range of IV strains. For instance, FI6v3 can neutralise all IVA strains and CR9114 can neutralise all IVA and IVB strains 14,16 . These conserved epitopes in the RBD and stalk look promising for designing a universal vaccine with higher effectiveness against circulating pathogenic IV strains. Recent advances in computational antigen design have generated new avenues for vaccine development against multiple IVA and IVB strains 17-19 .
Understanding HA structural variation across IVA and IVB strains is critical for antibody therapy and vaccine design, especially the hypervariability in the RBD which can lead to ineffective immune responses. As such, we characterised the crystal structures of HA from six H1 subtype IVA strains, with variations in the RBD. In addition, we determined the crystal structure of A/Denver/57 HA in complex with the anti-influenza antibody C05 fragment antigen binding (Fab). The structure and sequence of all seven HAs were then analysed to highlight similarities and differences. This study provides further insights into HA spike proteins from one of the most critical IV subtypes and expands our knowledge of HA structure and antibody binding.

Materials and methods
Cloning, expression, and purification of C05 Fab. C05 Fab was expressed and secreted from mammalian HEK293 cells (ThermoFisher; catalog R79007) in FreeStyle 293 media with PEI induction. The expressed protein construct contains a C-terminal non-cleavable His6 tag on the heavy chain. The protein was initially purified by nickel affinity chromatography on a HisTrap nickel Excel column in 20 mM Tris pH 8.0, 250 mM NaCl (Buffer A) and eluted with Buffer A supplemented with 500 mM imidazole (Buffer B) using a 5-60% Buffer B linear gradient. The eluted protein was concentrated from ~0.1 to 1.26 mg/mL, then further purified by size exclusion chromatography on a Superdex 200 in 25 mM Tris pH 8.0, 150 mM NaCl. C05 Fab protein was then concentrated to 9.94 mg/mL and stored at −80 °C prior to crystallisation.
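As a quick check of the elution conditions, the imidazole concentration at any point of the gradient follows from the Buffer B fraction. The sketch below assumes that the 5-60% gradient refers to Buffer B (Buffer A plus 500 mM imidazole), that Buffer A contains no imidazole, and that mixing is ideal; the function name is illustrative and not part of the original protocol.

```python
def imidazole_mM(percent_b, stock_b_mM=500.0):
    """Imidazole concentration (mM) at a given % Buffer B.

    Assumes ideal mixing of Buffer A (0 mM imidazole) with
    Buffer B (500 mM imidazole by default).
    """
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must be between 0 and 100")
    return stock_b_mM * percent_b / 100.0


# Start and end of the 5-60% linear gradient
start = imidazole_mM(5)
end = imidazole_mM(60)
print(f"Gradient spans {start:.0f}-{end:.0f} mM imidazole")
```

Under these assumptions the gradient runs from 25 to 300 mM imidazole.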
Cloning, expression, and purification of IVA HAs. Full length IVA H1N1 HAs were cloned from synthetic genes into vectors for expression in baculovirus infected insect cells (obtained from Expression Systems; website: https://expressionsystems.com/products/cells) using a general expression and purification protocol described previously 20 . All IV open reading frames were cloned into a pBac vector with ampicillin resistance, encoding an N-terminal GP67 secretion sequence (sequence: MVLVNQSHQGFNKEHTSKMVSAIVLYVLLAAAAHSAFAGS) and a C-terminal thrombin-cleavable trimerisation domain followed by a His6 affinity tag (sequence: EFLVPRGSPGSGYIPEAPRDGQAYVRKDGEWVLLSTFLGHHHHHH). IVA HAs were expressed in baculovirus infected T. ni insect cells (3% virus) using ESF-921 media and were harvested after 2 days. Media containing the secreted protein was buffer exchanged into 20 mM Tris pH 8.0, 150 mM NaCl and concentrated from 5 to 1 L using a Tangential Flow Filtration system. The protein was purified via nickel affinity chromatography on a Ni HiTrap chelating column. The eluted HA0 protein was treated with trypsin to generate mature HA (HA1 and HA2 polypeptides) and to remove the C-terminal trimerisation domain with His6 affinity tag. The protein was next purified by subtractive nickel affinity chromatography followed by size exclusion chromatography using a Superdex 200 column equilibrated with 25 mM Tris pH 8.0, 150 mM NaCl. The protein was concentrated to ~10 mg/mL and stored at −80 °C prior to crystallisation.
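The layout of the C-terminal tag can be checked directly from its sequence. The sketch below locates the canonical thrombin recognition motif (LVPR|GS, cleaved between Arg and Gly) in the tag sequence given above and splits the tag at that point. It only illustrates where the designed site lies; the protocol itself reports trypsin treatment for tag removal, and the function name is hypothetical.

```python
# C-terminal tag sequence as given in the methods text
C_TERM_TAG = "EFLVPRGSPGSGYIPEAPRDGQAYVRKDGEWVLLSTFLGHHHHHH"

# Canonical thrombin recognition motif; cleavage occurs between R and G
THROMBIN_SITE = "LVPRGS"


def split_at_thrombin(seq):
    """Return (retained, released) fragments after cleavage at LVPR|GS."""
    start = seq.find(THROMBIN_SITE)
    if start < 0:
        raise ValueError("no thrombin site found")
    cut = start + 4  # index of the Gly that follows LVPR
    return seq[:cut], seq[cut:]


retained, released = split_at_thrombin(C_TERM_TAG)
print("retained on HA:", retained)
print("released fragment:", released)
print("released fragment carries His6:", released.endswith("HHHHHH"))
```

The released fragment contains both the trimerisation domain and the His6 tag, which is consistent with the subtractive nickel affinity step that follows cleavage.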
The head of A/Fort Monmouth/1/1947 HA was cloned from a synthetic gene into a pET28a vector with kanamycin resistance, encoding an N-terminal thrombin-cleavable His6 tag (sequence: MGSSHHHHHHSSGLVPRG). The head domain was expressed in E. coli BL21(DE3) cells (from New England Biolabs, catalog number C2527H) in TB media overnight at 25 °C with 1 mM IPTG induction. Cells were harvested and lysed by sonication, then clarified in 250 mL of 50 mM Tris pH 8.5, 200 mM NaCl, 1 protease inhibitor tablet, 100 mg Lysozyme, 250 U Benzonase. Triton X-100 was added to 0.5% final concentration. The inclusion bodies were washed twice with 50 mM Tris pH 8.5, 200 mM NaCl, 10 mM EDTA, 10 mM beta mercaptoethanol, 1% Triton X-100, then twice with 50 mM Tris pH 8.5, 200 mM NaCl, 10 mM EDTA, 10 mM beta mercaptoethanol. The inclusion bodies were solubilised in 6 M guanidinium-HCl, 50 mM Tris pH 8.5, 150 mM NaCl, 10 mM beta mercaptoethanol. The protein was purified via nickel affinity chromatography on a HisTrap nickel Excel column and then refolded by dialysis over 2 days, initially with 50 mM Tris pH 8.5, 0.4 M L-arginine, 1 mM EDTA, 150 mM NaCl, 1 mM oxidized glutathione (GSSG), 1 mM reduced glutathione (GSH), then with 25 mM Tris pH 8.5, 150 mM NaCl. The His6 tag was removed by incubation with thrombin for 4 h on ice. The refolded protein was further purified by size exclusion chromatography on a Superdex 200 column in 50 mM Tris pH 8.5, 150 mM NaCl. The protein was concentrated to 5.04 mg/mL and stored at −80 °C prior to crystallisation. Yield was 2.1 mg from 1.86 L of media.
Although we were unable to solve the structure of the A/Denver/57/1957 trimer in isolation, we solved a 2.92 Å resolution structure of this HA bound to the head-binding antibody C05 Fab. A schematic overview of the various HA domains and their respective structures is shown in Fig. 1A,B, and here we use the structure of HA from A/Denver/57/1957 to illustrate the structural features. The immature, precursor form of HA (HA0) is 565 amino acids long and is cleaved into two main segments (HA1 and HA2) by trypsin-like proteases 27-31 . These HA1 and HA2 domains remain bound through an extensive network of interactions involving a disulfide bond between two conserved cysteines in the cleaved segments (Cys21 in the HA1 N-terminal domain and Cys480 in the HA2 C-terminal domain; Fig. 1A), in addition to 32 hydrogen bonds, 6 salt bridges, and a buried surface area of 2526 Å² (Fig. 1A; Supplementary Fig. 2). HA1 has three main domains. The fusion domain (red) comprises N-terminal (1-59) and C-terminal (278-325) segments of HA1 that are bound together to form part of the stalk. The vestigial esterase (VE) domain (yellow) also comprises both N-terminal (60-112) and C-terminal (266-277) segments of HA1, but forms part of the head domain (Fig. 1A). The function of the VE domain in IVA and IVB is not well defined; however, it shares 54% homology with the 9-O-acetylesterase domain in IVC 32 , which cleaves the host receptor to facilitate viral budding 33 . The RBD (green) is located in the middle of HA1 (113-265) and is responsible for binding to sialic acid receptors, as well as mediating the release of viral ribonucleoprotein particles (vRNPs) into the cytoplasm 34,35 . HA2 comprises two main domains (cyan): a long ectodomain (335-500) that is responsible for viral fusion, and a C-terminal transmembrane (TM) anchor.
Collectively, the RBD and VE domain comprise the head, and the fusion domain, ectodomain, and TM anchor comprise the stalk of HA. Proteins, Interfaces, Structures and Assemblies (PISA) 36 analysis predicted the biological assembly to be a trimer for all seven structures (Fig. 1C); however, the number of interfacing residues differed between them (Supplementary Tables 4-10).
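The domain boundaries described above for A/Denver/57/1957 HA can be captured in a small lookup, which is convenient when asking which domain a given residue falls in. This is a minimal sketch: the data structure and function name are illustrative, the residue ranges are those quoted in the text, and the TM anchor is omitted because its residue range is not stated.

```python
# (domain name, region, residue ranges) as quoted in the text.
# The TM anchor is left out: its residue range is not given.
DOMAINS = [
    ("HA1 fusion domain", "stalk", [(1, 59), (278, 325)]),
    ("vestigial esterase (VE) domain", "head", [(60, 112), (266, 277)]),
    ("receptor binding domain (RBD)", "head", [(113, 265)]),
    ("HA2 ectodomain", "stalk", [(335, 500)]),
]


def locate(residue):
    """Return (domain, region) for a residue number, or None if unassigned."""
    for name, region, ranges in DOMAINS:
        for first, last in ranges:
            if first <= residue <= last:
                return name, region
    return None


# The two cysteines of the HA1-HA2 disulfide both sit in the stalk
print(locate(21))   # Cys21
print(locate(480))  # Cys480
# An epitope residue from the RBD
print(locate(146))  # Arg146
```

Residues that fall outside the quoted ranges (for example 326-334) return `None` rather than being assigned to a neighbouring domain.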
The stalk consists of the HA1 fusion domain, the HA2 ectodomain, and the TM anchor. The TM anchor is responsible for fusing to the endosomal membrane of the host cell 37 . HA1 and HA2 are linked via a disulfide bond between two Cys residues (Cys21 and Cys480) at the N-terminus of HA1 and the C-terminus of HA2, as well as several hydrogen bonds (Supplementary Fig. 2).
HA sequence alignment of the seven IVA strains (Fig. 3A) reveals greater conservation (highlighted red) in the stalk than the head. Importantly, the four sialic acid binding pocket residues are completely conserved in all strains (Fig. 3A, green circles). As highlighted by sequence conservation mapped onto the HA structure (Fig. 3B), the majority of non-conserved residues are within the RBD (Fig. 3A, green line), which has the potential to affect the efficacy of head-binding antibodies against different IV strains. Antigenic motifs in H1N1     20 . PISA analysis of the HA:C05 complex revealed that seven HA residues mediate the interaction between the HA RBD antigenic motif and the C05 heavy chain, while only two HA residues mediate binding to the C05 light chain (Fig. 4A,B). Residues within the HA that mediate binding to the heavy chain include Tyr108, Thr144, Arg146, Val148, Ser158, Glu203, and Ser240, whilst HA residues interacting with the light chain include Arg18 and Thr20. Comparing the structure of A/Denver/57 HA in complex with C05 Fab to our six other HA structures showed differences in the C05 antibody binding site (Fig. 4C). The only residue conserved in all seven structures is Tyr108, which is also one of the conserved sialic acid binding pocket sites. Thr144 is not conserved in any other structure and is replaced by Asp in A/Jiangsu/ALS1/2011 (PDB ID: 6D8W), Ser in the A/Melbourne/1/1946 strain, and Asn in the other strains. Arg146 was conserved as either a Lys or Arg in all strains except A/Hickox/JY2/1940, where an Ile residue was present. Val148 and Ser158 were both highly conserved, with one exception being A/Jiangsu/ALS1/2011, where these residues were Thr and Asn, respectively. Glu203 was conserved as a negatively charged Asp/Glu residue with the exception of A/Hickox/JY2/1940, which contained an Asn. Finally, for Ser240, the most notable amino acid change was in A/Almaty/32/1998, which harboured a Pro at this position.
Insertion between 146R-G147 influences C05 antibody binding. The antibody C05 Fab failed to bind many of our other IVA HAs, as examined by analytical size exclusion chromatography (see data below). Failure to bind correlated with an insertion in HA between 146R-G147 of the epitope, whereas sequences without an insertion bound the antibody. For example, both A/Denver/57 (PDB ID: 6ML8, sequence: WPNHTTR/ G147; Supplementary Fig. 4) Supplementary Fig. 7) (Fig. 5B).
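To illustrate how an insertion between 146R and G147 can be detected from sequence alone, the sketch below compares a query loop against the A/Denver/57 motif (WPNHTTR/G) by matching the shared prefix and suffix. It is a simplified illustration, not the alignment procedure used in this study: it assumes the flanking residues are identical, the function name is hypothetical, and the query sequence is an invented example rather than one of the strains analysed here.

```python
def find_insertion(reference, query):
    """Return (position, inserted_residues) if query equals reference plus
    one contiguous insertion; otherwise return None.

    position counts the reference residues preceding the insertion.
    """
    if len(query) <= len(reference):
        return None
    # Length of the shared prefix
    p = 0
    while p < len(reference) and reference[p] == query[p]:
        p += 1
    # Length of the shared suffix, not overlapping the prefix
    s = 0
    while s < len(reference) - p and reference[-1 - s] == query[-1 - s]:
        s += 1
    if p + s != len(reference):
        return None  # differences other than a single insertion
    return p, query[p:len(query) - s]


DENVER_LOOP = "WPNHTTRG"         # motif quoted in the text (R146, G147)
HYPOTHETICAL_LOOP = "WPNHTTRNG"  # invented example with one extra residue

print(find_insertion(DENVER_LOOP, HYPOTHETICAL_LOOP))
```

A result of `(7, "N")` means the extra residue follows the seventh residue of the motif, that is, it sits between the Arg and the Gly. Loops that also carry substitutions return `None` and would need a proper sequence alignment.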

Discussion
Antigenic drift, an evolutionary accumulation of amino acid substitutions in antigenic proteins determined by host adaptive immune systems 43 , is one of the key mechanisms used by viruses to avoid recognition by the host immune system. Antigenic drift can also increase viral attachment to its host receptor. For this to happen, viral surface proteins undergo high selective pressure during their evolution 44 . HA and NA are spike proteins found on the surface of IVA and IVB virions, with HAs the main target of the humoral immune system. In humans, there are two main categories of broadly neutralising antibodies (bnAbs) that target IV HAs, those that bind to either the head (specifically the RBD) or the stalk. Due to the high sequence conservation within the stalk, stalk-binding bnAbs such as FI6v3 and CR9114 can neutralise a wide range of IV HAs. Conversely, the neutralisation ability of head-binding bnAbs is much narrower due to the hypervariable sequence of the RBD 14 . C05 employs a unique approach, demonstrating a long complementarity-determining region (CDR) to bind the RBD, minimising contact with the hypervariable sequence 20 . This allows C05 to bind diverse HAs from H1, H2, H3 and H9 viruses 45 .
In the present study, we characterised the sequence and structure of HA from the H1N1 influenza strain A/Denver/57 bound to the bnAb C05 Fab and compared it to six additional H1N1 IVA HAs. Our analysis of all seven HA structures revealed that they adopt similar conformations and retain conserved putative sialic acid binding site residues, as seen in previously described IV HA structures. While further experiments are necessary to confirm the sialic acid binding site in our structures, highly conserved binding site residues in the previously characterised structure of IV HA with sialic acid suggest a similar binding site in our structures 12,34,46 .
The VE domain exists in all seven of our IVA HA structures; however, its function in IVA and IVB is not well defined. In IVC, the similar 9-O-acetylesterase domain within HEF cleaves the sialic acid receptor on the host cell to release the virus and help viral budding, whereas NA is responsible for cell receptor cleavage in IVA and IVB 47 . Some studies show that the VE domain has antigenic properties, with several mouse (mAb) and human (hAb) antibodies, such as mAb H3v-47, mAb 1H5, mAb CR8071, mAb 100F4, hAb 46B8, and hAb 100F4, found to target this domain 48-52 . This suggests that the VE domain could be considered as a potential target for vaccine design; however, the variability of this domain (as demonstrated in Fig. 3) may be an obstacle.
Recently, the HA stalk has become an interesting target for vaccine and drug design due to its conserved sequence and structure 16,34,53 . Within all seven of our IVA HA structures, the stalk is highly conserved, and the stalk sequence is reported to be highly conserved both between and within IVA, IVB, and IVC. This allows some of the stalk-binding antibodies to bind and neutralise a variety of influenza strains 16 . hAbs CR6261 and F10 were found to neutralise most IVA group 1 strains by binding to the HA stalk 54,55 . Further, hAbs FI6v3 and CR9114 were found to bind IVA and both IVA and IVB, respectively 50,56 . These stalk-binding antibodies act by preventing conformational changes of HA, consequently hindering membrane fusion 50,54-56 . This highlights the potential of the HA stalk in the development of universal influenza vaccines.
C05 is a well-characterised human antibody that employs a long heavy chain CDR3 for binding to the RBD in the HA head. This method of binding minimises the area contacting the hypervariable residues in the RBD. Previous studies report that C05 binds H3 subtype HA through the antibody heavy chain, using a single long loop that allows it to decrease the contact area to ~550 Å², considerably smaller than the usual antibody-antigen contact (~650 and ~740 Å² buried surface area on HA by CH65 and 2D1 antibodies, respectively). C05 was claimed to bind H1 and H2 subtype HAs via a similar mechanism 20 . Interestingly, our structure of the H1N1 influenza strain A/Denver/57 HA bound to the bnAb C05 Fab indicates that both the heavy and light chains contribute to binding. Further, the surface area buried by the C05 heavy chain CDR is ~786 Å², which is close to that of the 2D1 antibody. Our other IVA HA structures all contain an insertion near the C05 Fab CDR3 binding site that appears to prevent antibody binding, via either insertion of an Asn which becomes glycosylated to physically block C05 binding, or insertion of an Arg, Lys, or Ile into the C05 binding pocket. Interestingly, these insertions do not affect the identity or the positioning of the sialic acid binding residues, allowing these strains to bind the cell surface receptors while eliminating nearby potential antibody recognition sites. Moreover, this same loop-insertion position was identified in H3N2 HA domains that were responsible for loss of C05 binding 20 . A recent study assayed the capacity of C05 to bind diverse HA molecules (not examined here) 45 . Examination of those HA sequences revealed that insertions in the binding pocket could predict the ability of C05 to bind and recognise, underpinning the importance of this region in determining C05 specificity. Although the insertion of either a glycosylated residue observed in the electron density of our structures (A/Netherlands/002P1/1951 and A/Fort Monmouth/1/1947) or of a bulky side chain which partially changes the loop conformation (A/Jiangsu/ALS1/2011, A/Almaty/32/1998, A/Hickox/JY2/1940, and A/Melbourne/1/1946) appears consistent with the loss of C05 antibody Fab binding by analytical SEC, rigorous mutational analysis and quantitative binding experiments would be required to prove these are evasion mechanisms. It is possible, for example, that the glycosylation could be accommodated by a shift in the positioning of the antibody, as was the case for the S245N glycosylation site of H3N2 viral neuraminidases with IG01 antibodies (Ref PubMed ID 36543789).
Figure caption (partial): insertion of an amino acid between 146R-G147 pushes a bulky residue side chain into the C05 binding interface. In both images, C05 heavy and light chains are coloured blue and yellow, respectively, and 6ML8 HA is coloured magenta (note 6ML8 binds C05). In the left panel, 7JPD is coloured cyan (and does not bind C05 due to a glycosylation); in the right panel, 6OSR is coloured orange and has a loop insertion at the C05 interface.